Abstract.

Diversification is nested, and early models suggested this could lead to a great deal 
of evolutionary redundancy in the Tree of Life. This result is based on a particular set of 
branch lengths produced by the common coalescent, where pendant branches leading to 
tips can be very short compared to branches deeper in the tree. Here, we analyze 
alternative and more realistic Yule and birth-death models. We show how censoring at the
present both makes average branches one half what we might expect and makes pendant
and interior branches roughly equal in length. Although dependent on whether we 
condition on the size of the tree, its age, or both, these results hold both for the Yule 
model and for birth-death models with moderate extinction. Importantly, the rough 
equivalency in interior and exterior branch lengths means the loss of evolutionary history 
with loss of species can be roughly linear. Under these models, the Tree of Life may offer 
limited redundancy in the face of ongoing species loss. 

(Keywords: Phylogenetic tree, Yule process, extinction, phylogenetic diversity ) 



In a well-cited paper, Nee and May (1997) state that "80% of the underlying tree of
life can survive even when approximately 95% of species are lost." This quote has
percolated through the literature (see, e.g., Erwin 2008; Purvis 2008; Roy et al. 2009;
Santos et al. 2010; Vamosi and Wilson 2008). This high level of phylogenetic redundancy
is due to Nee and May using coalescent-type models of tree shape, where pendant edges are
expected to be much shorter than interior edges. Here, we test the robustness of this result
by building on recent algebraic results from Steel and Mooers (2010) to derive the expected
branch lengths on phylogenies produced under alternative Yule and birth-death models of
diversification. We highlight three findings: (i) the average length of branches in pure-birth
(Yule) trees is roughly one half our naive expectation; (ii) the expected length of the
interior branches and those leading to species are the same or nearly so, and this means
that (iii) the relationship between the loss of species to extinction and the loss of
phylogenetic diversity (Faith 1992) can be much more precipitous than that quoted above
(Nee and May 1997). All three findings hold for birth-death trees with low to moderate
relative extinction rates.

For much of what follows, we will consider a pure-birth Yule tree with diversification
rate $\lambda$. We note that inferred phylogenetic trees are often more imbalanced than Yule trees
(Mooers and Heard 1997), but currently, no biological model captures this empirical
distribution. More importantly for what follows, the Yule process produces a distribution
of splitting events on the tree from past to present that is intermediate between that
expected under an adaptive radiation (Gavrilets and Vose 2005; Rabosky and Lovette
2008), where splits are concentrated nearer the root, and that expected under long-term
equilibrium models of diversification (Hey 1992; Hubbell 2001), where splits are
concentrated nearer the present. Our main motivation for focusing on this model is that
trees sampled from the literature tend to have splitting times concentrated nearer the root
(McPeek 2008; Morlon et al. 2010), making the Yule model a conservative model when
measuring phylogenetic redundancy.

We refer to branches that lead to the tips of a tree as pendant edges (with expected
average length $p_n$, where $n$ is the number of tips) and branches found deeper within the
tree as interior edges (with expected average length $i_n$). The term 'expected average
length' clarifies that two random processes are at work - the production of a Yule tree and
the selection of an edge from that tree. The expected phylogenetic diversity of such a tree
is the sum of the expected pendant and interior edge lengths, i.e. $L_n = n p_n + (n-2) i_n$. We
will assume throughout that the tree starts as an initial bifurcation, such that at some time
$t$ in the past it has two lineages each of length 0 (as in Nee (2001)). After time $t$ from the
initial bifurcation, we produce a binary tree with $n$ tips (as in Nee (2001); Yang and
Rannala (1997)), and several properties of this process have been well-studied by these and
other authors. In particular, the expected number of tips in the tree is $2e^{\lambda t}$.

Given rate $\lambda$, the time that a given lineage persists until it splits on a Yule tree has
an exponential distribution with a mean of $\frac{1}{\lambda}$. This motivates our naive expectation that
the expected average edge length on such a tree would also be $\frac{1}{\lambda}$. We first present a simple
proof that the expected average edge length in a Yule tree is actually $\frac{1}{2\lambda}$. This provides an
underlying intuition that is absent from the purely algebraic proof of Steel and Mooers
(2010). We then summarise and extend some results from Steel and Mooers (2010) to
describe how the relative lengths of pendant and interior edges are affected by (i)
conditioning on, (ii) estimating, or (iii) not knowing, three related quantities: $n$, the
number of tips of the tree; $t$, the depth of the tree; and $\lambda$, the diversification rate. We then
further extend our results to birth-death trees, and finally revisit the provocative question:
at what rate do we lose phylogenetic diversity as we lose species on a tree?



Expected length of a branch on a Yule tree 
sampled at the present 

Let us assume that we observe a Yule tree at the moment that it has grown to $n + 1$
tips ($n = 4$ in Fig. 1). We do not condition on its depth ($t$). We can designate the edge
that has just split as an interior edge, and disregard the two zero-length branches that have
just arisen. Doing so designates an equal number ($n - 1$) of interior and pendant edges on
this tree. One might think of this Yule tree as one that has been 'cut at' (or conditional
on) the observation of $n + 1$ tips. Intuitively, even though the expected length of an edge
on an uncensored tree would be $\frac{1}{\lambda}$, the designated pendant edges will be shorter due to this
conditioning. However, interior branches are also affected by this censoring: particularly
long interior branches would stretch to the present, and so would be pendant edges. This
means that the expected lengths of interior edges are also shorter than $\frac{1}{\lambda}$.

Theorem 1. In a Yule tree, at the latest speciation event, the expected length of a
randomly drawn edge is $\frac{1}{2\lambda}$.

Proof: Consider the late sampling scenario described in the preceding paragraph,
and let the $n - 1$ remaining pendant edges each grow under the Yule process until they also
split, disregarding all the new infinitesimal edges that result. Each of these grown pendant
edges has an expected length $g_n$ and is made up of two segments - its expected length
before the tree had $n + 1$ tips ($= p_b$) and its expected length as it continued to grow after
the tree had $n + 1$ tips ($= p_a$), such that $g_n = p_b + p_a$. Importantly, given the memoryless
nature of exponential processes, the length of any pendant edge segment observed from the
time that $n + 1$ tips are produced (the dashed lines in Fig. 1) is drawn from one common
exponential distribution, with the same parameter $\lambda$, so that $p_a = \frac{1}{\lambda}$. Also, $p_n$ on the
censored tree $= p_b$ on the uncensored tree.

Given an equal number of interior and pendant edges on this uncensored tree, we
can write an expression for the expected length (call it $E[L]$) of any randomly drawn edge
on this tree as:

$$E[L] = \frac{1}{2} i_n + \frac{1}{2}(p_b + p_a) = \frac{1}{2} i_n + \frac{1}{2}\left(p_b + \frac{1}{\lambda}\right). \tag{1}$$

Any single lineage has $E[L] = \frac{1}{\lambda}$ and so we can substitute this for $E[L]$ to obtain:

$$\frac{1}{2} i_n + \frac{1}{2} p_n = \frac{1}{2} i_n + \frac{1}{2} p_b = \frac{1}{2\lambda}, \tag{2}$$

because $p_n = p_b$. The left member in equation (2) is the expected length of a randomly
drawn edge in the censored Yule tree, which completes the proof.
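
The late sampling scenario in this proof is easy to check by simulation. The sketch below is only an illustration under our own assumptions: the function names `mean_edge_length` and `estimate`, the seed, and the choice $n = 10$, $\lambda = 2$ are ours. It grows a Yule tree from two lineages, stops when $n + 1$ tips first appear, and averages the $2(n-1)$ non-zero edge lengths over many trees.

```python
import random

def mean_edge_length(n, lam, rng):
    """Mean length of the 2(n - 1) non-zero edges of a Yule tree that is
    grown from two lineages and observed when n + 1 tips first appear."""
    births = [0.0, 0.0]   # birth times of the lineages alive right now
    now = 0.0
    lengths = []
    while True:
        # with k living lineages, the waiting time to the next split is Exp(k * lam)
        now += rng.expovariate(lam * len(births))
        parent_birth = births.pop(rng.randrange(len(births)))
        lengths.append(now - parent_birth)          # a split edge is interior
        if len(births) + 2 == n + 1:
            # the n - 1 untouched lineages are the censored pendant edges;
            # the two new zero-length branches are disregarded
            lengths.extend(now - b for b in births)
            return sum(lengths) / len(lengths)
        births.extend([now, now])

def estimate(n=10, lam=2.0, trees=20000, seed=1):
    """Monte Carlo estimate of the expected average edge length."""
    rng = random.Random(seed)
    return sum(mean_edge_length(n, lam, rng) for _ in range(trees)) / trees
```

With these settings the estimate should lie close to $1/(2\lambda) = 0.25$, rather than the naive $1/\lambda = 0.5$.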

This proof does not say anything about the relative lengths of internal vs. pendant 
edges per se - it might be that internal edges are still much longer than pendant ones on 
Yule trees that we observe at a single time slice, and it may be that the result hinges on 
observing the tree at exactly the moment that a speciation event occurs. We turn to these 
issues now. 
Figure 1: Growing a Yule tree, to illustrate the proof of Theorem 1. The horizontal line is
the observation time, when n+1 tips first appear. Below this line is the censored tree whose 
edge lengths we are modelling. The uncensored tree has each pendant edge continuing to 
lengthen till it speciates in turn. The thick lines denote interior branches, the thin lines are 
pendant edges on the censored tree, and the dashed lines are the segments that accrue to 
produce the uncensored tree. 



Expected pendant vs. interior edge lengths as 

FUNCTION (ONLY) OF n 



In the above construction of the Yule tree we made the convention that the edge
that has just split is an interior edge of the resulting tree. However, we could have
alternatively classified it as a pendant edge. In that case we have $n$ pendant edges and
$n - 2$ interior edges, and one can again consider the expected average pendant and interior
branch lengths, which we will denote as $i'_n$ and $p'_n$. Steel and Mooers (2010) used a
recursive argument to establish the following exact result: For all $n \geq 3$, we have:

$$i'_n = p'_n = \frac{1}{2\lambda}. \tag{3}$$



This result tells us exactly how $\frac{1}{2\lambda}$ is shared out between the two terms in Theorem 1.
Due to the memoryless nature of the exponential distribution, the pendant edge that was
chosen to split at our observation time is random with respect to its length, and so we can
express the lengths of the interior and pendant edges on the censored tree as:

$$i_n = \frac{1}{n-1}\left((n-2)\, i'_n + p'_n\right), \quad \text{and} \quad p_n = \frac{1}{n-1}\left(n p'_n - p'_n\right).$$

Eqn. (3) implies that, for all $n \geq 3$:

$$i_n = i'_n = \frac{1}{2\lambda} \quad \text{and} \quad p_n = p'_n = \frac{1}{2\lambda}.$$



In particular, the terms $i_n$ and $p_n$ in Theorem 1 are equal. We note that Theorem 1 is for a
late sampling scenario, when we show up just when $n + 1$ tips first appear. However, if we
only condition on $n$, but show up at a random time between the interval when $n$ and $n+1$
tips exist (i.e. if we 'show up' at the present to sample our tree), any pendant edge has the
same expected average length as in the late sampling scenario. This result is analogous to
the bus-stop problem: if buses arrive at a certain rate $b$ under an exponential process, if
one shows up at a random time, the expected time since the last bus is $b^{-1}$ rather than
something less than that. This property was formally proven for model trees (Gernhard
2008a) and also used recently by Hartmann et al. (2010) in the context of sampling trees
from evolutionary models.



Expected pendant vs. interior edge lengths as 

FUNCTIONS OF t (ALONE OR WITH n) 

The expected number of tips in a Yule tree at time $t$ is given by $N(t) = 2e^{\lambda t}$, since each of
the two initial lineages has a geometrically distributed number of tips, with a mean of $e^{\lambda t}$ (see
e.g. Nee et al. (1994), or Beichelt and Fatti (2002) (Example 6.10, pp. 193)). We now
introduce $P$ as the sum of all pendant edges, $I$ as the sum of all interior edges, and, as in
the introduction, $L$ as the total tree length, $L = P + I$. These quantities, conditional on
either $n$ or $t$ or both, should be noted, as they will be useful for many of the proofs that
follow. If we let $P(t)$ and $I(t)$ denote, respectively, the expected sum of the lengths of the
pendant and interior edges of a Yule tree grown for time $t$ and let $L(t) = P(t) + I(t)$, then,
from Steel and Mooers (2010), we have the following equalities:

$$L(t) = \frac{2}{\lambda}\left(e^{\lambda t} - 1\right); \quad P(t) = \frac{1}{\lambda}\left(e^{\lambda t} - e^{-\lambda t}\right) \quad \text{and} \quad I(t) = \frac{1}{\lambda}\left(e^{\lambda t} + e^{-\lambda t} - 2\right). \tag{4}$$

Thus the ratio of the expected average lengths of the pendant and interior edges of
a Yule tree of depth $t$ converges to 1 exponentially fast with increasing $t$. $P(t)$ is slightly
larger than $I(t)$, but the difference rapidly becomes negligible. In particular, the ratio
$P(t)/L(t)$ converges quickly to 1/2; we will consider this ratio further when we allow for
extinction.
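
The speed of this convergence can be seen by evaluating Eqn. (4) directly. The following is a minimal sketch; the function name `yule_sums` and the parameter values in the usage note are our own choices for illustration.

```python
import math

def yule_sums(lam, t):
    """Expected total, pendant and interior edge sums of a Yule tree of
    depth t, written out from Eqn. (4): returns (L(t), P(t), I(t))."""
    e_pos = math.exp(lam * t)
    e_neg = math.exp(-lam * t)
    total = 2.0 * (e_pos - 1.0) / lam
    pendant = (e_pos - e_neg) / lam
    interior = (e_pos + e_neg - 2.0) / lam
    return total, pendant, interior
```

For example, with $\lambda t = 5$ the ratio `pendant / total` is already within about one percent of 1/2, and `pendant - interior` equals $\frac{2}{\lambda}(1 - e^{-\lambda t})$, which stays bounded while both sums grow exponentially.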

Importantly, for most phylogenetic trees, both $n$ and $t$ will be known from the data.
Do the observations on edge lengths made above also hold when we condition on both $n$
and $t$? The expected total length of a Yule tree conditional on it having grown for time $t$
and having exactly $n$ tips at time $t$ is given by:

$$L_n(t) = t \cdot \left(2 + \frac{n-2}{x}\left(1 - y(x)\right)\right), \tag{5}$$

where $x = \lambda t$ and $y(x) := \frac{x}{e^{x} - 1}$, which is a function that decreases from 1 towards 0 as
$x > 0$ grows (for details, see Steel and Mooers (2010)). Let $I_n(t)$ and $P_n(t)$ denote the
expected sum of the interior and pendant edge lengths (respectively) of a Yule tree,
conditional on it having grown for time $t$ and having exactly $n$ tips at time $t$. Thus,
$I_n(t) + P_n(t) = L_n(t)$ (given by Eqn. (5)).

   n      P_n(t)          L_n(t)          R_n(t)
   4      3.03296 · t     2.8854 · t      1.0511
   16     3.8697 · t      6.7326 · t      0.5748
   64     9.2373 · t      17.8894 · t     0.5163
   256    26.3815 · t     52.3492 · t     0.5040
   1024   82.0735 · t     163.8260 · t    0.5010

Table 1: Sum of pendant edges ($P_n(t)$), sum of all edges ($L_n(t)$) and their ratio ($R_n(t)$) for
various tree sizes $n$ when both $n$ and $t$ are fixed and $\lambda$ is set to its maximum likelihood value.
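
As a numerical illustration of Eqn. (5), the sketch below evaluates $L_n(t)$ with $\lambda$ set to the estimate $\log(n/2)/t$, in which case the expression collapses to $t(n-2)/\log(n/2)$. The function names are ours, and the formula in the docstring is our reading of Eqn. (5); the code assumes $n > 2$ so that $x > 0$.

```python
import math

def expected_total_length(n, lam, t):
    """L_n(t) = t * (2 + (n - 2) / x * (1 - y(x))), with x = lam * t and
    y(x) = x / (exp(x) - 1); our reading of Eqn. (5), valid for x > 0."""
    x = lam * t
    y = x / math.expm1(x)
    return t * (2.0 + (n - 2) / x * (1.0 - y))

def lambda_ml(n, t):
    """Rate estimate log(n / 2) / t for a tree with n tips and age t."""
    return math.log(n / 2.0) / t
```

With $t = 1$ and $n = 4, 16, 64, 256, 1024$ this gives values that agree with the $L_n(t)$ column of Table 1 to the precision shown there.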

A proof of the following result is provided in the Appendix. 

Theorem 2. The expected length of a randomly picked pendant edge in a Yule tree on $n$
extant species and of age $t$ is,

$$\frac{1}{n} P_n(t) = t \left( \frac{2}{n(n-1)} + \frac{(n-2)\left[(n+5) - 4(1 + n + 2x)e^{-x} + (3n - 1 + 2(n+1)x)e^{-2x}\right]}{2x\,n(n-1)(1 - e^{-x})^2} \right),$$

where $x = \lambda t$.

In particular, if we set $\lambda$ to its maximum likelihood estimate, i.e. $\lambda_{ML} = \log(n/2)/t$
(Magallón and Sanderson 2001), then the ratio $R_n := P_n(t)/L_n(t)$ of the expected total
length of the pendant edges to the expected total length of all edges in a Yule tree on $n$
extant species and age $t$ is independent of $t$ and is given by:

$$R_n = \frac{n^3 - 3n^2 - 4n\log(n/2) + 4n - 4}{2(n-1)(n-2)^2},$$

which tends to 1/2 as $n \to \infty$.

Table 1 presents $P_n(t)$, $L_n(t)$ and their ratio $R_n(t)$ (i.e., $R_n(t)$ conditioned on $\lambda$
taking its maximum likelihood estimate) for a range of tree sizes.



Extension to birth-death models 



Allowing for random extinction (as well as speciation) introduces additional complexity
into the analyses presented above. We first consider what happens if we condition just on $n$
(and adopt the assumption that the time of origin of the initial lineage is a parameter of the
birth-death model). To do this, we have to assume a prior distribution for the time of
origin when conditioning the trees to have $n$ extant species. We make the common
assumption that the first species originated at any time in the past with uniform
probability (Aldous and Popovic 2005). This is also called an improper prior on $(0, \infty)$.
Conditioning the resulting tree to have $n$ extant species yields a proper distribution for the
time of origin (Gernhard 2008a). Note that, under the Yule model where $\mu = 0$, this
scenario is equivalent to stopping the process just before the $(n+1)$-th speciation event
(Hartmann et al. 2010), which is the setting we considered in the first two sections of this
paper. The following result generalizes those earlier findings to birth-death models (a proof
is provided in the Appendix). As usual, $\lambda$ is the per-lineage speciation rate and $\mu$ is the
per-lineage extinction rate.

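Theorem 3. The expected length of a pendant edge on a birth-death tree conditioned on $n$
is, for $0 < \mu < \lambda$,

$$E[p|n] = \frac{\mu + (\lambda - \mu)\log(1 - \mu/\lambda)}{\mu^2}; \tag{6}$$

for $\mu = \lambda$, we have:

$$E[p|n] = \frac{1}{\lambda},$$

and for $\mu = 0$, we have:

$$E[p|n] = \frac{1}{2\lambda}.$$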
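
The three cases of Theorem 3 fit together continuously, which can be checked numerically. The sketch below is an illustration only: the function name `expected_pendant_length` is ours, and the general-case expression, $(\mu + (\lambda - \mu)\log(1 - \mu/\lambda))/\mu^2$, is our reading of Eqn. (6).

```python
import math

def expected_pendant_length(lam, mu):
    """Expected pendant edge length on a birth-death tree conditioned on n,
    for 0 <= mu <= lam (our reading of Theorem 3)."""
    if not 0.0 <= mu <= lam:
        raise ValueError("requires 0 <= mu <= lam")
    if mu == 0.0:
        return 1.0 / (2.0 * lam)      # Yule case
    if mu == lam:
        return 1.0 / lam              # critical case
    # log1p(-mu/lam) = log(1 - mu/lam), computed stably for small mu/lam
    return (mu + (lam - mu) * math.log1p(-mu / lam)) / mu ** 2
```

For $\lambda = 1$ the value rises from 0.5 at $\mu = 0$, through about 0.61 at $\mu = 0.5$, towards 1 as $\mu \to \lambda$, so moderate extinction lengthens pendant edges only modestly.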



We can also obtain exact results for the lengths of the edges in a birth-death tree if
we condition (just) on time. In particular, we can provide extensions to equation (4) to
allow for extinction. We begin, as usual, with two lineages of length 0. Let $T^R(t)$ denote
the tree that is spanned by those taxa that are extant at time $t$; $T^R(t)$ is therefore referred
to as the 'reconstructed' birth-death tree (the tree consisting of edges that survive to time $t$
while extinct lineages are pruned away) (Nee et al. 1994; Gernhard 2008a). If there are no
taxa extant at time $t$, we say that $T^R(t)$ is empty. Let $N^R(t)$ denote the expected number
of tips in the reconstructed birth-death tree, given by the well-known formula:

$$N^R(t) = 2e^{(\lambda - \mu)t}, \quad t \geq 0.$$

Note that although $N^R(t)$ tends to infinity as $t$ grows when $\lambda > \mu$, it is quite possible that
the actual number of lineages at time $t$ is 0, in which case $T^R(t)$ is empty. Let $L^R(t)$ be the
expected total length of the reconstructed birth-death tree, and let $P^R(t)$ be the expected
sum of the pendant branch lengths of this tree. The proof of the following result is
provided in the Appendix.



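Theorem 4. Consider a birth-death tree with speciation rate $\lambda > 0$ and extinction rate $\mu$
that starts from two lineages of length 0. Let $\rho = \frac{\lambda}{\mu}$, $r = \lambda - \mu$ and let $f_\rho(s) = \frac{\rho e^{s} - 1}{(\rho - 1)e^{s}}$.
Then, for $t > 0$:

(i) $$L^R(t) = \frac{2e^{rt}}{\mu} \ln f_\rho(rt).$$

(ii) $$P^R(t) = \frac{2e^{rt}}{\mu}\left(1 - (\rho - 1)\cdot\left[\ln f_\rho(rt) + \frac{1}{\rho e^{rt} - 1}\right]\right).$$

(iii) For $\rho > 1$, the limiting ratio $r_\rho := \lim_{t \to \infty} \frac{P^R(t)}{L^R(t)}$ is given by:

$$r_\rho = \frac{1}{\ln\left(\frac{\rho}{\rho - 1}\right)} - \rho + 1.$$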



The function $r_\rho$ from part (iii) is shown in Fig. 2. Note that the 0.5 asymptote
agrees with the ratio of $P^R(t)$ and $L^R(t)$ as in the pure-birth model as calculated earlier
(i.e. $r_\rho \to \frac{1}{2}$ as $\rho \to \infty$). Interestingly, the asymptote is reached fairly quickly on large
trees. For example, from Fig. 2, we see that when the extinction rate is one-third of the
speciation rate ($\rho = 3$), then $r_\rho = 0.47$ and the expected pendant edge length is 87% the
expected interior edge length. Mild extinction in a uniform birth-death model does not
produce particularly short pendant edges. At the other extreme, as the extinction rate
approaches the speciation rate (so $r$ and $\rho$ converge to 0 and 1 respectively) $r_\rho$ can be
easily shown to converge to 0, as suggested by Fig. 2. It is interesting to note that the
expected sum of pendant edge lengths in the reconstructed tree at time $t$ (i.e. $P^R(t)$)
divided by the expected number of extant taxa at time $t$ (i.e. $2e^{(\lambda - \mu)t}$) converges to the
same expression as given in Eqn. (6) as $t \to \infty$.

Figure 2: Graph of $r_\rho$, which is the limiting ratio (for large $t$) of the sum of pendant edge
lengths to the sum of all edge lengths in a birth-death tree, in which the speciation rate is
$\rho > 1$ times the extinction rate.
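
The limiting ratio in part (iii) of Theorem 4 is simple to evaluate. The sketch below (the function name `limiting_pendant_fraction` is ours) uses the form $r_\rho = 1/\ln(\rho/(\rho-1)) - \rho + 1$ and reproduces the figures quoted for $\rho = 3$.

```python
import math

def limiting_pendant_fraction(rho):
    """r_rho = 1 / ln(rho / (rho - 1)) - rho + 1, for rho > 1."""
    if rho <= 1.0:
        raise ValueError("requires rho > 1")
    # ln(rho / (rho - 1)) = -log1p(-1 / rho), which is stable for large rho
    log_term = -math.log1p(-1.0 / rho)
    return 1.0 / log_term - rho + 1.0
```

For $\rho = 3$ this gives about 0.466, so the pendant-to-interior ratio $r_\rho/(1 - r_\rho)$ is about 0.87. The value approaches 1/2 as $\rho$ grows, and falls towards 0 (only logarithmically slowly) as $\rho \to 1$.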



Expected PD under simple Field-of-Bullets model 

for Yule trees 

The expected lengths of edges in a tree are directly relevant for quantifying the 
expected loss of 'phylogenetic diversity' (PD) under simple models of extinction in which 
each tip is deleted with some fixed probability. In these models, edges that are 'deep' 
within the tree are more likely to contribute to the PD score of the surviving taxa than 
pendant edges of similar length, since they are more likely to have at least one non-extinct 
taxon in the clade they support. This redundancy leads to the nonlinear decrease of PD as
more species are removed from a tree (Nee and May 1997). However, the ratio of the
lengths of pendant to interior edges is also critical, as pendant edges will be the first to be
deleted from the tree. In this section, we analyse the expected PD score of a Yule tree 
under random taxon deletion. Note that there are two random processes at play here: the 
Yule process that produces the tree, and then the extinction process that deletes taxa. 

Consider then a Yule tree that starts with a split into two lineages at time 0 and is
grown until time $t > 0$. At that time, each tip is selected independently with probability $s$,
and the remaining tips are deleted (pruned). Thus $s$ is the 'survival probability' of a taxon.
Let $\psi_t(s)$ be the PD of the resulting pruned tree, and let $\pi_t(s) = E[\psi_t(s)]$, where $E[\cdot]$
denotes expectation with respect to the random Yule tree and the random pruning
operation. Thus, $\pi_t(1)$ is the expected PD of the (entire) Yule tree, namely
$L(t) = \frac{2}{\lambda}(e^{\lambda t} - 1)$ (Eqn. (4)). For $s < 1$, $\pi_t(s)$ is the expected PD one obtains by generating
a Yule tree until time $t$ and then applying a field-of-bullets pruning with survival
probability $s$ for each tip. The proof of the following result is provided in the Appendix.



Theorem 5.

$$\pi_t(s) = \frac{2 s e^{\lambda t}}{\lambda(1 - s)} \cdot \left[-\log\left(s + (1 - s)e^{-\lambda t}\right)\right].$$

The ratio $\pi_t(s)/\pi_t(1)$ of the expected PD in the pruned tree to the expected PD of the total
tree therefore converges (quickly) with $t$ to the limit:

$$\pi(s) = \frac{-s\log(s)}{1 - s}.$$

Theorem 5 implies that $\pi_t(s) > s \cdot \pi_t(1)$ for all $t > 0$. Moreover, the limiting ratio
$\pi(s)$ is a continuous and concave, positive function that approaches 0 as $s \to 0$ and
approaches 1 as $s \to 1$ (see Fig. 3). For $s = 0.5$, $\pi(s) = \log(2) = 0.69$. The slope function
$\pi'(s)$ approaches infinity as $s$ approaches 0 from above and $\pi'(s)$ approaches $\frac{1}{2}$ as $s$
approaches 1 from below. This latter result can be seen by considering that pendant edges
are the first to be lost from a tree undergoing extinction; under the Yule model, the sum of
the pendant edges constitutes 0.5 of the total PD (Theorem 2).
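
The field-of-bullets result can be explored numerically. In the sketch below the function names are ours, and the finite-time expression is our reading of Theorem 5, $\pi_t(s) = \frac{2 s e^{\lambda t}}{\lambda(1-s)}\left[-\log\left(s + (1-s)e^{-\lambda t}\right)\right]$, with $\pi_t(1) = L(t)$ handled as a separate case.

```python
import math

def expected_pd(s, lam, t):
    """pi_t(s): expected PD of a Yule tree of depth t after each tip
    survives independently with probability s (0 < s <= 1)."""
    if s == 1.0:
        return 2.0 * math.expm1(lam * t) / lam      # L(t) from Eqn. (4)
    growth = math.exp(lam * t)
    log_term = -math.log(s + (1.0 - s) / growth)
    return 2.0 * s * growth / (lam * (1.0 - s)) * log_term

def limiting_pd_fraction(s):
    """pi(s) = -s log(s) / (1 - s), the large-t limit of pi_t(s) / pi_t(1)."""
    if s == 1.0:
        return 1.0
    return -s * math.log(s) / (1.0 - s)
```

For instance, `limiting_pd_fraction(0.5)` equals $\log 2$, and `limiting_pd_fraction(0.05)` is about 0.158, so that losing 95% of species at random leaves roughly 16% of the tree.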



The high level of redundancy reported by Nee and May (1997) is due to their use of
coalescent-type models of tree shape with a constant population-size, where the pendant
edges are expected to be much shorter than the interior edges. More precisely, the ratio of
the expected total length of the pendant edges to the expected total length of the interior
edges converges to 0 with increasing $n$, at a rate $1/\log(n)$; see e.g. Fu and Li (1993) (Eqns.
(10-12)). An example of the relationship between $s$ and the proportion of the tree
remaining under Nee and May's model (for $n = 1000$) is shown in Fig. 3.

Under a Yule model, where interior and pendant edges have roughly the same
expected length, the situation is quite different. If we take $s = 0.05$, then $\pi(s) = 0.157$.
That is, in a large tree, if we lose 95% of species (randomly) then we would expect to lose
more than 84% of the tree. This lower level of redundancy is also more in line with
statistical (Morlon et al. 2011) and empirical estimates of tree loss under extinction
regimes (von Euler 2001; Purvis et al. 2000; Vamosi and Wilson 2008), where tree shape
and non-random extinction interact (see also Heard and Mooers 2000; Nee 2005).

Figure 3: Lower solid line shows the proportion of PD remaining when random extinction
occurs with probability $1 - s$ on a Yule tree (from $\pi(s)$ from Theorem 5). The dotted line
shows the same quantity for the coalescent-style tree used by Nee and May (1997), for
$n = 1000$. The curve in between these two shows the same quantity but on a birth-death tree
with $\mu = 0.5\lambda$, as described by Eqn. (7).



Similar results hold with birth-death trees under mild extinction (see Fig. 2), where
the sum of the pendant edges constitutes $r_\rho$ of the total PD. In particular, for $\lambda > \mu > 0$,
the second formula presented in Theorem 5 can be modified as follows (see Appendix):

$$\pi(s) = \frac{s}{(a - s)} \cdot \log\left(\frac{s}{a}\right) \cdot \frac{1 - a}{\log(a)}, \tag{7}$$

where $a = 1 - \mu/\lambda$.

Fig. 3 exhibits an example curve $\pi(s)$ on a birth-death tree constructed with $\mu = 0.5\lambda$.
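
The birth-death curve can be evaluated in the same way as the Yule one. The sketch below is illustrative only: the function name is ours, and the expression is our reading of Eqn. (7), $\pi(s) = \frac{s}{a - s}\log\left(\frac{s}{a}\right)\frac{1-a}{\log a}$ with $a = 1 - \mu/\lambda$. The removable singularity at $s = a$ is handled by taking the limit.

```python
import math

def bd_limiting_pd_fraction(s, a):
    """Limiting fraction of PD retained on a birth-death tree when each
    tip survives with probability s; a = 1 - mu/lam, with 0 < a < 1."""
    if not (0.0 < a < 1.0 and 0.0 < s <= 1.0):
        raise ValueError("requires 0 < a < 1 and 0 < s <= 1")
    if s == a:
        # limit of s * log(s / a) / (a - s) as s -> a is -1
        return -(1.0 - a) / math.log(a)
    return s / (a - s) * math.log(s / a) * (1.0 - a) / math.log(a)
```

With $\mu = 0.5\lambda$ (so $a = 0.5$) and $s = 0.05$ this gives about 0.185, slightly above the Yule value of about 0.158, and as $a \to 1$ the expression recovers $-s\log(s)/(1-s)$.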



We note that this modified formula for $\mu > 0$ should be used with care for larger
values of $\mu$ for two reasons. Firstly, birth-death trees are increasingly likely to die out as $\mu$
approaches $\lambda$ and so an asymptotic ratio of expected values such as $\pi(s)$ may be a poor
estimate of expected PD loss in such situations. Note in particular that in the limit as
$\mu/\lambda \to 1$, we have $\pi(s) = 1$ for all $s > 0$. This of course does not mean that if 99.9% of the
taxa are eliminated, then we would still expect to retain 100% of the phylogenetic diversity!

The second reason for caution is more empirically based. In the extreme (critical)
case where $\mu = \lambda$ then, as we have noted already, if we condition on a tree having $n$ extant
leaves (assuming a uniform prior distribution for the time of the origin of the tree, as in
Aldous and Popovic (2005)), then the expected distribution of branch lengths in this tree
would be precisely that given by the coalescent process (Gernhard 2008b) that was used in
the analysis by Nee and May (1997). The problem now is that typical species-level
phylogenetic trees look very different from such constant-size coalescent-shaped trees. Hey
(1992), using a sample of only eight trees, was the first to point out that the coalescent
model produced unreasonably short pendant edges (see also Morlon et al. (2011)), while
McPeek's recent compilation (McPeek 2008) of 245 fairly well-sampled chordate,
arthropod, mollusk, and magnoliophyte phylogenies showed that these trees tended to
have a branch length distribution in the opposite direction to the coalescent, with edges
near the leaves tending to be, on average, slightly longer than expected under the Yule
model. McPeek used the gamma statistic from Pybus and Harvey (2000) to describe the
distribution of branch lengths as one moves from the root of the tree to the tips, and found
that the majority of trees had negative gamma values, rather than having them centered
on 0 as expected under the Yule model (Pybus and Harvey 2000) or taking the positive values
expected under the coalescent (Pybus et al. 2002). Indeed, we show in the appendix that
the expected value of gamma for a coalescent tree with $n$ tips increases indefinitely at a rate of $\sqrt{3n}$.



Morlon et al. (2010) applied a coalescent framework that allows for incomplete
taxon sampling to an overlapping set of 289 trees and found that the majority of trees
(> 80%) had splitting times that were either consistent with the Yule model or
concentrated nearer the root. Though nonrandom sampling may be a concern (Cusimano
and Renner 2010), the observation that most nearly-complete phylogenetic trees have
gamma values close to zero (or negative), as well as the explicit test of the Yule model by
Morlon et al. (2010), suggest that our use of this model in analyzing expected loss of PD
may be conservative.



Conclusion 



Although the Yule model of diversification is nearly 100 years old, it still holds some
surprises. The fact that real trees are conditioned on $t$ and that we show up at some
random time after $n$ tips have been produced leads to the observation that average
pendant edge lengths (species ages) and internal edge lengths (those that anchor higher
clades) are expected to be nearly equal under the Yule model. Although not all edges are
expected to be the same length - for instance the two edges incident to the root are longer
than others (results not shown) - this conditioning also makes randomly selected edge
lengths one half of the naive expectation. These observations may be useful in informing
prior distributions on edge lengths for tree inference.

Mild amounts of uniform extinction do not change these general observations.
Indeed, the 'push of the past' (Harvey et al. 1994; Phillimore and Price 2008), which
describes the expectation that those groups which diversified faster than expected early on
are more likely to be sampled in the present, would lead to internal edges being even
shorter relative to pendant edges. Non-uniform models, such as adaptive radiations where
diversification actually slows down through time (Rabosky and Lovette 2008; Morlon et al.
2010), would do the same. All these processes work against the redundancy inherent in the
Tree of Life. We predict that this redundancy may not be as great as hoped for. Of course,
this prediction must await more complete, dated trees.
Appendix: Proofs of Theorems



Proof of Theorem 2

We can modify the argument that leads to the differential equation $\frac{dI(t)}{dt} = \lambda P(t)$
from (Steel and Mooers 2010) so as to take into account conditioning on $n$ as well as $t$ -
the analysis consists of calculating quantities such as $\mathbb{P}[X_t = n - 1 \mid X_{t+\delta} = n]$, where $X_t$
denotes the number of species present at time $t$, for which Eqn. (4) of (Nee 2001) is
helpful. In this way one can derive the following sequence of first-order linear differential
equations for $I_n = I_n(t)$:

$$\frac{dI_n}{dt} + \frac{\lambda(n-2)}{1 - e^{-\lambda t}} I_n = \frac{\lambda(n-2)}{1 - e^{-\lambda t}} \cdot \left(I_{n-1} + \frac{1}{n-1} P_{n-1}\right). \tag{8}$$

Notice that the term $P_{n-1} = P_{n-1}(t)$ on the right-hand side of (8) can be replaced
by $L_{n-1}(t) - I_{n-1}(t)$ (with $L_{n-1}(t)$ given by (5)). Moreover, when $n = 2$ we have the initial
solution $I_2(t) = 0$ (and $P_2(t) = 2t$) for all $t > 0$, and for each $n$ we have the boundary
condition $I_n(t) = 0$ at $t = 0$.

It can now be verified that the expression given in Theorem 2 for $P_n(t)$ satisfies this
system of linear differential equations subject to the boundary condition, and so is the
unique solution.

For the second claim, if we set $\lambda$ to its maximum likelihood estimate, i.e.
$\lambda_{ML} = \log(n/2)/t$, then,

$$P_n(t) = \frac{2\log(n/2)}{\lambda(n-1)} + \frac{(n-2)\left[(n+5) - 4(1 + n + 2\log(n/2))e^{-\log(n/2)} + (3n - 1 + 2(n+1)\log(n/2))e^{-2\log(n/2)}\right]}{2\lambda(n-1)(1 - e^{-\log(n/2)})^2}$$

$$= \frac{2\log(n/2)}{\lambda(n-1)} + \frac{(n-2)\left[(n+5) - 4(1 + n + 2\log(n/2))\,2n^{-1} + (3n - 1 + 2(n+1)\log(n/2))(n/2)^{-2}\right]}{2\lambda(n-1)(1 - 2/n)^2}$$

$$= t\,\frac{n^3 - 3n^2 - 4n\log(n/2) + 4n - 4}{2\log(n/2)(n-1)(n-2)}.$$

The sum of all edge lengths is in expectation (Steel and Mooers 2010),
$L_n(t) = t\,\frac{n-2}{\log(n/2)}$, and therefore the ratio $R_n$ is the expression given in Theorem 2. From this
expression it is easily seen that $\lim_{n \to \infty} R_n = 1/2$.



□ 



Proof of Theorem 3

The probability $v(k)$ that a leaf is attached to the $k$th speciation event in a tree on
$n$ extant species under the Yule or birth-death model is, from Stadler (2008), given by:

$$v(k) = \frac{2k}{n(n-1)}. \tag{9}$$

For $0 \leq \mu < \lambda$, let:

$$p_0(t) := \frac{1 - e^{-(\lambda - \mu)t}}{\lambda - \mu e^{-(\lambda - \mu)t}} \quad \text{and} \quad p_1(t) := \frac{(\lambda - \mu)^2 e^{-(\lambda - \mu)t}}{\left(\lambda - \mu e^{-(\lambda - \mu)t}\right)^2},$$

while for $\mu = \lambda$, let:

$$p_0(t) := \frac{t}{1 + \lambda t} \quad \text{and} \quad p_1(t) := \frac{1}{(1 + \lambda t)^2}.$$

The probability that a lineage produces 0 (resp. 1) offspring after time $t$ is $\mu p_0(t)$ (resp.
$p_1(t)$) (Kendall 1949). We first establish the following result:



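Lemma 6. The length of a randomly picked pendant edge in a birth-death tree on $n$ extant
species has probability density function $f_p(t|n) = 2\lambda p_1(t)(1 - \lambda p_0(t))$.

Proof. For proving the lemma, we will use the probability density of the time of the $k$-th
speciation event in a birth-death tree with $n$ extant species which is derived in Gernhard
(2008a), and for $\mu < \lambda$, we get,

$$f_{n,k}(t) = (k+1)\binom{n}{k+1} \lambda^{n-k} (\lambda - \mu)^{k+2} e^{-(\lambda - \mu)(k+1)t} \frac{\left(1 - e^{-(\lambda - \mu)t}\right)^{n-k-1}}{\left(\lambda - \mu e^{-(\lambda - \mu)t}\right)^{n+1}}. \tag{10}$$

Using Equations (9) and (10), we can write,

$$f_p(t|n) = \sum_{k=1}^{n-1} v(k) f_{n,k}(t)$$

$$= 2\sum_{k=1}^{n-1} \binom{n-2}{k-1} \lambda^{n-k} (\lambda - \mu)^{k+2} e^{-(\lambda - \mu)(k+1)t} \frac{\left(1 - e^{-(\lambda - \mu)t}\right)^{n-k-1}}{\left(\lambda - \mu e^{-(\lambda - \mu)t}\right)^{n+1}}$$

$$= \frac{2\lambda(\lambda - \mu)^3 e^{-2(\lambda - \mu)t}}{\left(\lambda - \mu e^{-(\lambda - \mu)t}\right)^3}.$$

For $\mu = \lambda$, we take the limit $\mu \to \lambda$ (using the property $e^{-\epsilon} \sim 1 - \epsilon$), which establishes the
lemma. □

Note that the length of a pendant edge is independent of $n$. Theorem 3 now follows
directly from Lemma 6 by evaluating $\int_0^\infty t f_p(t|n)\,dt$. □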



Proof of Theorem \^ 



The quantity $f_\rho(rt) = \frac{r}{\lambda - \mu e^{-rt}}$ is the probability that a birth-death tree that starts with a single lineage at time 0 has at least one extant lineage at time $t$ (Eqn. (2) of Nee et al. (1994)). Thus, by considering the first $\delta$ period of time in a birth-death tree that begins with a single lineage, the expected total sum $S(t)$ of branch lengths spanning the leaves present at time $t$ satisfies the differential expression:

$$S(t+\delta) = 0\cdot\mu\delta + 2S(t)\cdot\lambda\delta + \big(S(t) + \delta f_\rho(rt)\big)\cdot\big(1-(\mu+\lambda)\delta\big) + O(\delta^2),$$

(by considering whether the lineage becomes extinct, speciates, or persists unchanged within this initial $\delta$ period). Since $L_R(t) = 2S(t)$ this leads to the following differential equation:

$$\frac{dL_R(t)}{dt} = rL_R(t) + 2f_\rho(rt). \tag{11}$$
Solving Eqn. (11) subject to $L_R(0) = 0$ gives part (i) of the Theorem. By considering the evolution of the tree from time $t$ to $t+\delta$, a straightforward dynamical argument leads to a second differential equation that links $L_R(t)$ to $P_R(t)$:

$$\frac{dL_R(t)}{dt} = N(t) - \mu P_R(t). \tag{12}$$

Part (ii) follows by equating the right-hand sides of Eqns. (11) and (12) to express $P_R(t)$ in terms of quantities already determined. For Part (iii), observe that $r > 0$ and $f_\rho(rt) \to (\rho-1)/\rho$ as $t \to \infty$ and so, from parts (i) and (ii), we have the asymptotic equivalences $L_R(t)/(2e^{rt}) \sim \mu^{-1}\ln[\rho/(\rho-1)]$ and $P_R(t)/(2e^{rt}) \sim \mu^{-1}\big(1-(\rho-1)\ln[\rho/(\rho-1)]\big)$. Taking the ratio of these quantities gives the result claimed.
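The asymptotic equivalences can be illustrated by integrating Eqn. (11) numerically. The Python sketch below is not part of the proof: the function names are hypothetical, the Runge-Kutta step count is arbitrary, and it assumes $r = \lambda - \mu$, $\rho = \lambda/\mu$ and $N(t) = 2e^{rt}$ (the expected number of lineages when the tree starts with two lineages), which is consistent with the normalisation by $2e^{rt}$ used above.

```python
import math


def survival_prob(t, lam, mu):
    """r / (lam - mu*e^{-rt}): chance that one lineage has extant descendants at t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))


def total_length(t_end, lam, mu, steps=20000):
    """Classical RK4 integration of dL/dt = r*L + 2*f(t) with L(0) = 0 (Eqn. 11)."""
    r = lam - mu
    h = t_end / steps

    def rhs(t, length):
        return r * length + 2.0 * survival_prob(t, lam, mu)

    length, t = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, length)
        k2 = rhs(t + h / 2, length + h * k1 / 2)
        k3 = rhs(t + h / 2, length + h * k2 / 2)
        k4 = rhs(t + h, length + h * k3)
        length += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return length


def scaled_lengths(t_end, lam, mu):
    """Return (L/(2e^{rt}), P/(2e^{rt})), with P obtained from Eqn. (12)."""
    r = lam - mu
    n_t = 2.0 * math.exp(r * t_end)  # assumed expected number of lineages
    length = total_length(t_end, lam, mu)
    pendant = (n_t - r * length - 2.0 * survival_prob(t_end, lam, mu)) / mu
    return length / n_t, pendant / n_t
```

For $\lambda = 1$, $\mu = 0.5$ and a large $t$, the two returned values can be compared with $\mu^{-1}\ln[\rho/(\rho-1)]$ and $\mu^{-1}(1-(\rho-1)\ln[\rho/(\rho-1)])$.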



Proof of Theorem 5



Let $\phi_t = \phi_t(s)$ be the analogue of $\psi_t(s)$ if we start the Yule tree with a single lineage (rather than two) at time $t = 0$; thus,

$$\pi_t(s) = E[\psi_t(s)] = 2E[\phi_t(s)], \tag{13}$$

(the behaviour of $\phi$ is slightly easier to analyse than $\psi$). Let $X_t$ denote the number of tips in the Yule tree (starting with a single lineage at time 0) at time $t$. Consider $\phi_{t+\delta}$, for a small value $\delta > 0$. In the first $\delta$ period of time the initial lineage can either (i) speciate (with probability $\lambda\delta + O(\delta^2)$) or (ii) fail to speciate (with probability $1 - \lambda\delta + O(\delta^2)$) and so we have:



$$\phi_{t+\delta} = \begin{cases} \phi_t^1 + \phi_t^2 + O(\delta), & \text{with probability } \lambda\delta + O(\delta^2); \\ \phi_t + Y_t, & \text{with probability } 1 - \lambda\delta + O(\delta^2), \end{cases} \tag{14}$$

where

$$E[Y_t \mid X_{t+\delta} = n] = \delta\cdot(1-(1-s)^n),$$

and $\phi_t$, $\phi_t^1$ and $\phi_t^2$ are independent random variables having the same distribution as $\phi_t(s)$ (the contribution of $\delta$ to the PD score of the tree applies precisely if at least one of the tips at time $t+\delta$ is sampled, and this event, conditional on $X_{t+\delta} = n$, has probability $1-(1-s)^n$). Now,

$$P(X_{t+\delta} = n \mid X_\delta = 1) = P(X_t = n \mid X_0 = 1),$$

and it is a classic result that this latter probability is that of a geometric distribution with mean $e^{\lambda t}$ (see e.g. Beichelt and Fatti (2002), Example 6.10, p. 193) and so:



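$$E[Y_t] = \delta\cdot\big(1 - E[(1-s)^{X_t}]\big) = \delta\cdot\Big(1 - \sum_{n\ge 1}(1-s)^n e^{-\lambda t}(1-e^{-\lambda t})^{n-1}\Big) = \frac{\delta\cdot s}{s + qe^{-\lambda t}}, \tag{15}$$

where $q = 1 - s$. Let $\pi'_t(s) = E[\phi_t(s)]$ (the prime marks the single-lineage quantity, not a derivative). Taking expectation of (14) (with respect to both the Yule tree and the random sampling process) and applying (15) leads to the following differential relationship for $\pi'_t(s)$: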



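$$\pi'_{t+\delta}(s) = 2\lambda\delta\cdot\pi'_t(s) + (1-\lambda\delta)\cdot\pi'_t(s) + \frac{\delta\cdot s}{s + qe^{-\lambda t}} + O(\delta^2).$$

This leads to the following first-order, linear differential equation for $\pi'_t(s)$:

$$\frac{d\pi'_t(s)}{dt} = \lambda\pi'_t(s) + \frac{s}{s + qe^{-\lambda t}}.$$

Solving this equation gives $\pi'_t(s)$, and thereby the stated value for $\pi_t(s) = 2\pi'_t(s)$ (by (13)).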
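The two computational steps of this proof, the geometric sum in (15) and the solution of the linear differential equation, can be sketched in Python as follows. The function names are hypothetical and the parameter values arbitrary. The closed form used for comparison, $-\frac{s\,e^{\lambda t}}{\lambda q}\ln(s + qe^{-\lambda t})$ for $0 < s < 1$, is obtained here with an integrating factor and the initial condition $\pi'_0(s) = 0$; it is an assumption of this sketch and has not been checked against the expression stated in Theorem 5.

```python
import math


def stem_factor_series(t, lam, s, terms=2000):
    """Truncated series 1 - sum_{n>=1} (1-s)^n e^{-lam t} (1 - e^{-lam t})^{n-1}."""
    q = 1.0 - s
    p = math.exp(-lam * t)
    return 1.0 - sum(q ** n * p * (1.0 - p) ** (n - 1)
                     for n in range(1, terms + 1))


def single_lineage_pd_rk4(t_end, lam, s, steps=2000):
    """RK4 integration of d(pi')/dt = lam*pi' + s/(s + q e^{-lam t}), pi'(0) = 0."""
    q = 1.0 - s
    h = t_end / steps

    def rhs(t, y):
        return lam * y + s / (s + q * math.exp(-lam * t))

    y, t = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def single_lineage_pd_closed(t, lam, s):
    """Integrating-factor solution (an assumption of this sketch), for 0 < s < 1."""
    q = 1.0 - s
    return -s * math.exp(lam * t) / (lam * q) * math.log(s + q * math.exp(-lam * t))
```

Doubling either function's result gives the corresponding value of $\pi_t(s)$ by (13).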



The modification of this result to give Eqn. (7) in the birth-death setting, with $0 < \mu < \lambda$, following a similar case analysis (but allowing for the possibility of extinction), leads to the differential equation for $M_t(s) = E[\phi_t(s)]$:

$$\frac{dM_t(s)}{dt} = (\lambda-\mu)M_t(s) + P(\phi_t(s) \neq 0). \tag{16}$$



Now, by Eqn. (1) of Yang and Rannala (1997) (or see Nee et al. (1994)) we have:

$$P(\phi_t(s) \neq 0) = \frac{\alpha s}{s - (s-\alpha)e^{-(\lambda-\mu)t}}, \tag{17}$$

where $\alpha = 1 - \mu/\lambda$. Now $\pi_t(s)$ lies between $2M_t(s)$ and $2M_t(s) - t$ (depending on whether we add the lengths of all the edges from the extant taxa to the root, or just the edges from the extant taxa to their most recent common ancestor), from which Eqn. (7) follows by evaluating the limit of the ratio $\pi_t(s)/\pi_t(1)$ as $t \to \infty$.

□
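The limit of the ratio $\pi_t(s)/\pi_t(1)$ can also be approximated numerically from (16) and (17). The Python sketch below is a hedged illustration, not the authors' derivation: the function names are hypothetical, and it assumes $M_0(s) = 0$, so that $e^{-rt}M_t(s)$ tends to $\int_0^\infty e^{-ru}P(\phi_u(s)\neq 0)\,du$ with $r = \lambda - \mu$. The substitution $v = e^{-ru}$ turns this into an integral over $[0,1]$, which is evaluated by Simpson's rule. The output has not been compared with Eqn. (7).

```python
def scaled_limit(s, lam, mu, steps=2000):
    """Approximate lim e^{-rt} M_t(s) as (1/r) * int_0^1 alpha*s/(s-(s-alpha)v) dv."""
    r = lam - mu
    alpha = 1.0 - mu / lam

    def integrand(v):
        # Right-hand side of (17) with e^{-(lam-mu)u} replaced by v.
        return alpha * s / (s - (s - alpha) * v)

    h = 1.0 / steps  # composite Simpson rule; steps must be even
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / (3.0 * r)


def limiting_pd_ratio(s, lam, mu):
    """Approximate limit of pi_t(s) / pi_t(1) for sampling probability s in (0, 1]."""
    return scaled_limit(s, lam, mu) / scaled_limit(1.0, lam, mu)
```

For example, `limiting_pd_ratio(0.5, 1.0, 0.5)` estimates the long-run fraction of phylogenetic diversity retained when half of the tips are sampled.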



The expected value of gamma under the coalescent process 



Under a Yule (pure-birth) model, the gamma statistic has a standard normal distribution (and hence mean 0), while under a coalescent model its expected value is positive. Under the coalescent model, the original $\gamma$ statistic grows at the asymptotic rate of $\sqrt{n}$ as the number of tips $n$ grows.

Theorem 6. For a coalescent tree with $n$ leaves, $\gamma/\sqrt{n}$ converges in probability to $\sqrt{3}$ with increasing $n$.

For a rooted binary tree with $n > 2$ leaves, let $g_2, g_3, \ldots, g_n$ be the times between successive speciation events, measured from the root to the leaves, and let $T_n = \sum_{j=2}^{n} j g_j$. From Pybus and Harvey (2000) we have $\gamma = X_n/K_n$, where $X_n$ can be written in the form:

$$X_n = \frac{1}{n-2}\sum_{i=2}^{n} a_i g_i, \quad\text{where}\quad a_i = i(n/2) - 2\binom{i}{2},$$

and

$$K_n = T_n\sqrt{\frac{1}{12(n-2)}}.$$

Now, under the coalescent, the random variables $g_2, \ldots, g_n$ are independently distributed, with $g_j$ having an exponential distribution with mean $1/\binom{j}{2}$. It follows that $X_n/\log(n)$ and $T_n/(2\log(n))$ have expected values that converge to 1, and variances that converge to 0 as $n \to \infty$, and so $X_n/\log(n)$ and $T_n/(2\log(n))$ each converge in probability to the constant 1 as $n \to \infty$. Consequently, the ratio $X_n/T_n$ converges in probability to $1/2$ as $n \to \infty$, and so $\gamma/\sqrt{n} = \frac{X_n}{T_n}\cdot 2\sqrt{\frac{3(n-2)}{n}}$ converges in probability to $\sqrt{3}$, as claimed.

Finally, a more careful asymptotic analysis provides a closer approximation to $\gamma/\sqrt{n}$ by the formula $\sqrt{3}\cdot\left(1 - \frac{2}{\log(n)+C}\right)$, where $C$ is Euler's constant (0.5772...), and simulations confirm this improved fit.

□ 
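The behaviour of $\gamma/\sqrt{n}$ under the coalescent can be explored with a short simulation. The Python sketch below is a minimal illustration and not the simulation code used for the results reported here: the function names, seed and replicate counts are hypothetical choices. It draws the waiting times $g_j$ as independent exponentials with rate $\binom{j}{2}$ and evaluates $\gamma = X_n/K_n$ with $a_i = i(n/2) - 2\binom{i}{2}$.

```python
import math
import random


def gamma_from_waiting_times(g):
    """Gamma statistic written as X_n / K_n; g[j] holds g_j for j = 2..n."""
    n = len(g) - 1  # entries g[0] and g[1] are unused placeholders
    t_n = sum(j * g[j] for j in range(2, n + 1))
    x_n = sum((j * n / 2 - j * (j - 1)) * g[j] for j in range(2, n + 1)) / (n - 2)
    k_n = t_n * math.sqrt(1.0 / (12 * (n - 2)))
    return x_n / k_n


def simulate_coalescent_waiting_times(n, rng):
    """Independent exponential waiting times g_j with mean 1 / C(j, 2)."""
    g = [0.0, 0.0]
    for j in range(2, n + 1):
        g.append(rng.expovariate(j * (j - 1) / 2))
    return g


def mean_scaled_gamma(n, reps, seed=1):
    """Average of gamma / sqrt(n) over `reps` simulated coalescent trees."""
    rng = random.Random(seed)
    total = sum(gamma_from_waiting_times(simulate_coalescent_waiting_times(n, rng))
                for _ in range(reps))
    return total / reps / math.sqrt(n)
```

Because the convergence to $\sqrt{3}$ is logarithmic in $n$, values returned by `mean_scaled_gamma` for moderate $n$ are expected to sit noticeably below $\sqrt{3} \approx 1.732$.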



